A better theory of semantics would do more than help researchers detect the objects and features that are natural to the AI. It would also help them check whether a given AI treats some feature of its environment, or some class of objects, as a natural cluster, and help researchers agree, within provable bounds, on precisely which concept they are targeting.
This part isn’t so clear to me. Why can’t I just look at what features of the world an AI represents without a theory of semantics?
I guess in that case I’d worry that you go and look at the features, come away with some impression of what those features represent, and it turns out you’re totally wrong? I keep coming back to the example of a text classifier where you find “the French activation directions”, except it turns out that only one of them is for French (if any at all) and the others are things like “words ending in x and z” or “words spoken by fancy people in these novels and quotes pages”.
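To make the worry concrete, here is a toy sketch of how eyeballing a direction can mislead. Everything in it is invented for illustration: the word list, the activation values, and the helper names `top_k`, `agreement`, and `ends_in_x_or_z` are all hypothetical, and nothing here comes from a real model. The point is only that the top activations can look like French while a different hypothesis fits the whole sample better.

```python
# Toy data: (word, is_french, activation on one hypothetical direction).
# All values are made up for illustration, not measured from any model.
SAMPLES = [
    ("prix", True, 0.93),
    ("bordeaux", True, 0.91),
    ("voix", True, 0.90),
    ("chez", True, 0.89),
    ("quartz", False, 0.87),
    ("jazz", False, 0.86),
    ("box", False, 0.85),
    ("maison", True, 0.15),
    ("fromage", True, 0.12),
    ("table", False, 0.10),
    ("bonjour", True, 0.08),
    ("house", False, 0.05),
]


def ends_in_x_or_z(word):
    """Rival hypothesis: the direction tracks spelling, not language."""
    return word.endswith(("x", "z"))


def top_k(samples, k):
    """Return the k samples with the highest activation."""
    return sorted(samples, key=lambda s: s[2], reverse=True)[:k]


def agreement(samples, predicate, threshold=0.5):
    """Fraction of samples where 'direction fires' matches the predicate."""
    hits = sum(
        (activation > threshold) == predicate(word, is_french)
        for word, is_french, activation in samples
    )
    return hits / len(samples)


# Eyeballing the top few activations suggests "this is the French direction".
print([word for word, _, _ in top_k(SAMPLES, 4)])

# Scoring both hypotheses over the whole sample tells a different story.
print("is French:       ", agreement(SAMPLES, lambda w, f: f))
print("ends in x or z:  ", agreement(SAMPLES, lambda w, f: ends_in_x_or_z(w)))
```

On this toy data the four strongest activations are all French words, yet the "ends in x or z" hypothesis matches every sample while "is French" matches only half. Of course, this check only works because the rival hypothesis was written down in advance, which is roughly the gap a theory of semantics would be meant to fill.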